[Metrics] move prompt_tokens_total report to main process #7982

Open
liyonghua0910 wants to merge 2 commits into PaddlePaddle:develop from liyonghua0910:develop+20260602_prompt_tokens_total

liyonghua0910 (Collaborator) commented Jun 2, 2026

Motivation

Move prompt token related metrics reporting from the API-side EngineClient to the engine main process, so prompt_tokens_total is reported from the process that owns the main metrics collector.

Modifications

  • Move prompt_tokens_total, request_prompt_tokens, and request_params_max_tokens reporting from fastdeploy/entrypoints/engine_client.py to fastdeploy/engine/common_engine.py.
  • Remove the unused main_process_metrics import from fastdeploy/entrypoints/engine_client.py.

Usage or Command

pytest tests/engine/test_common_engine.py tests/pooling/test_Qwen3-Embedding_serving.py tests/pooling/test_Ernie4_5_reward_serving.py

Accuracy Tests

N/A. This PR only changes metrics reporting location and does not change model output logic.

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot

This comment was marked as outdated.

codecov-commenter commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@529ec9e). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7982   +/-   ##
==========================================
  Coverage           ?   67.73%           
==========================================
  Files              ?      468           
  Lines              ?    65989           
  Branches           ?    10186           
==========================================
  Hits               ?    44700           
  Misses             ?    18441           
  Partials           ?     2848           
| Flag | Coverage Δ |
| --- | --- |
| GPU | 77.86% <100.00%> (?) |
| XPU | 7.02% <0.00%> (?) |

PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-03 17:29:03 UTC+08:00

The CI report is generated from the following code (updated every 30 minutes):


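1 Task overview

Required tasks currently have 2 failed, 0 running, and 0 waiting; merging is not recommended for now. The main test failure is a problem in the PR code. The XPU 8-card failure shows up as a Decode node health-check timeout and looks more like an environment/service startup issue.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 41(0) | 41 | 36 | 4 | 0 | 1 | 0 |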

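2 Task status summary

Log column note: failed tasks use the log_links_markdown field directly (pre-generated); for running tasks, the [Job]({html_url}) link is assembled manually.

2.1 Required tasks: 8/10 passed

Required tasks block merging; failures must be handled first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
|  | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | 1h26m | PR issue: the newly added max_tokens metric has no None check | Check sampling_params for None before reporting | Job | - |
|  | xpu_8cards_case_test / run_xpu_8cards_cases | 18m38s | Environment issue: XPU Decode health check timed out | Environment issue, please rerun | Job | - |
|  | The other 8 required tasks passed | - | - | - | - | - |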

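2.2 Optional tasks: 28/31 passed

Optional tasks do not block merging; failures are for reference only.

| Status | Task | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
|  | Check PR Template | 23s | Job | - |
|  | Trigger Jenkins for PR | 7m46s | Job | - |
| ⏸️ | CI_HPU | - | - | - |
|  | The other 28 optional tasks passed | - | - | - |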

3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test failure (confidence: high)

  • Status: ❌ Failed
  • Error type: test failure
  • Confidence: high
  • Root cause summary: the newly added max_tokens metric has no None check
  • Analyzer: ci_analyze_unittest_fastdeploy

Failed cases:

| Test | Error | Root cause |
| --- | --- | --- |
| tests/pooling/test_Qwen3-Embedding_serving.py::test_single_text_embedding | AttributeError/HTTP 500 | sampling_params is None for embedding requests |
| tests/pooling/test_Ernie4_5_reward_serving.py::test_reward_model_with_caching | HTTP 500 | reward/pooling requests carry no generation sampling parameters |
| tests/engine/test_common_engine.py::test_insert_zmq_task_normal_request_with_worker_pid | AssertionError | The new metrics call raises before the subsequent scheduling/mapping logic runs |

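Root cause details:
The PR moves metrics such as request_params_max_tokens from engine_client.py to common_engine.py. However, Request.from_dict sets request.sampling_params = None when pooling_params is present, so the newly added request.sampling_params.max_tokens access raises AttributeError directly and embedding/reward requests return 500. The metrics mock in the unit test also does not cover the new metrics, so after the exception the subsequent trace, pause, and worker_pid logic never runs.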

Key log:

File "/workspace/FastDeploy/fastdeploy/engine/common_engine.py", line 1344
  main_process_metrics.obs_value("request_params_max_tokens", request.sampling_params.max_tokens)
AttributeError: 'NoneType' object has no attribute 'max_tokens'

Suggested fixes:

  1. fastdeploy/engine/common_engine.py:1344: before reporting request_params_max_tokens, check that request.sampling_params is not None; for pooling/reward/embedding requests, skip this metric or use the original max_tokens default from the request dict.
  2. tests/engine/test_common_engine.py: complete the DummyMetrics mocks for prompt_tokens_total, request_prompt_tokens, and request_params_max_tokens, and add coverage for the sampling_params is None case.

Fix summary: check sampling_params for None before reporting
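The None check suggested above could look roughly like the sketch below. It is not the FastDeploy code: `RecordingMetrics`, `inc_value`, and `report_prompt_metrics` are hypothetical names introduced for illustration, and only `obs_value`, `prompt_token_ids_len`, and `sampling_params.max_tokens` are taken from the log and review text in this thread.

```python
from types import SimpleNamespace


class RecordingMetrics:
    """Hypothetical stand-in for the main-process metrics object."""

    def __init__(self):
        self.calls = []

    def inc_value(self, name, value):
        # Assumed counter-style call; the real API may differ.
        self.calls.append(("inc", name, value))

    def obs_value(self, name, value):
        # Mirrors the obs_value(name, value) call shape seen in the traceback.
        self.calls.append(("obs", name, value))


def report_prompt_metrics(metrics, request):
    """Report prompt metrics, skipping max_tokens when sampling_params is None."""
    metrics.inc_value("prompt_tokens_total", request.prompt_token_ids_len)
    metrics.obs_value("request_prompt_tokens", request.prompt_token_ids_len)
    sampling_params = request.sampling_params
    if sampling_params is not None:
        # Pooling/embedding/reward requests never reach this line.
        metrics.obs_value("request_params_max_tokens", sampling_params.max_tokens)


# A generation request carries sampling_params; a pooling request does not.
generation = SimpleNamespace(
    prompt_token_ids_len=2, sampling_params=SimpleNamespace(max_tokens=16)
)
pooling = SimpleNamespace(prompt_token_ids_len=2, sampling_params=None)

generation_metrics = RecordingMetrics()
pooling_metrics = RecordingMetrics()
report_prompt_metrics(generation_metrics, generation)
report_prompt_metrics(pooling_metrics, pooling)
```

With this shape, the pooling request still contributes to the two prompt-token metrics and no longer raises AttributeError. Whether to skip the max_tokens metric or fall back to a default value is a design choice left to the PR author.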

Related changes: fastdeploy/engine/common_engine.py:1339-1344, fastdeploy/entrypoints/engine_client.py:371-376
Link: View log

xpu_8cards_case_test / run_xpu_8cards_cases: timeout (confidence: medium)

  • Status: ❌ Failed
  • Error type: timeout
  • Confidence: medium
  • Root cause summary: XPU Decode health check timed out
  • Analyzer: ci_analyze_unittest_fastdeploy

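Failed cases:

| Test | Error | Root cause |
| --- | --- | --- |
| tests/xpu_ci/8cards_cases/test_pd_21b_ep4tp1.py::test_pd_separation | Failed: PD分离服务启动失败 (PD disaggregation service failed to start) | Decode node did not become healthy within 600 seconds |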

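Root cause details:
This job failed during startup of the PD EP4TP1 service. During the health check the P node consistently returned 200, but the D node stayed at 000 from 0 to 591 seconds, which eventually triggered pytest.fail("PD分离服务启动失败") ("PD disaggregation service failed to start"). The sampling_params.max_tokens exception from the main test does not appear in the logs, and the later EP4TP4-related cases started and passed, so this looks more like an intermittent XPU Decode service startup/environment issue.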

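Key log:

Service health check in progress... waited 591 seconds, P node status code: 200, D node status code: 000
PD disaggregation service startup timed out: the service still had not started after 10 minutes!
tests/xpu_ci/8cards_cases/test_pd_21b_ep4tp1.py:285: Failed: PD分离服务启动失败 (PD disaggregation service failed to start)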

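Suggested fixes:

  1. Environment issue, please rerun; if it reproduces, first check the Decode node startup logs and the XPU/RDMA resource state.
  2. Watch for the a1_coverage.pth startup hook TypeError and the missing loaded_model_signal warning in the logs, and confirm whether they affect XPU worker startup.

Fix summary: environment issue, please rerun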
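For readers unfamiliar with this kind of startup check, the polling behavior described in the root cause can be sketched as below. This is an illustration only, not the CI script: `wait_until_healthy`, the probe callables, and the use of 0 for an unreachable node are assumptions made for the example.

```python
import time


def wait_until_healthy(probes, timeout_s=600.0, interval_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll every probe until all return 200 or the timeout elapses.

    `probes` maps a node name to a zero-argument callable that returns an
    HTTP status code (0 here stands for "no response"). Returns a tuple
    (ok, last_statuses).
    """
    start = clock()
    while True:
        statuses = {name: probe() for name, probe in probes.items()}
        if all(code == 200 for code in statuses.values()):
            return True, statuses
        if clock() - start >= timeout_s:
            return False, statuses
        sleep(interval_s)


# Simulated run with a fake clock: P is healthy, D never answers.
fake_now = [0.0]


def fake_sleep(seconds):
    fake_now[0] += seconds


ok, statuses = wait_until_healthy(
    {"P": lambda: 200, "D": lambda: 0},
    timeout_s=3.0,
    interval_s=1.0,
    clock=lambda: fake_now[0],
    sleep=fake_sleep,
)
```

In the simulated run the check gives up once the fake clock reaches the timeout, with P reporting 200 and D still unreachable, which matches the pattern in the key log.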

Related changes: no direct link to the files changed in this PR was found
Link: View log

PaddlePaddle-bot

This comment was marked as outdated.

TBD1 previously approved these changes Jun 3, 2026

PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-03 20:00:34

📋 Review summary

PR overview: moves the reporting point of the three metrics prompt_tokens_total, request_prompt_tokens, and request_params_max_tokens from engine_client.py (API process) to common_engine.py (main process).
Change scope: fastdeploy/engine/, fastdeploy/entrypoints/, tests/engine/
Impact tags: [Engine] [APIServer]

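Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | tests/engine/test_common_engine.py | New mock attributes were added without matching assertions, so the tests do not verify the new metric reporting behavior |

🟡 Suggestion, tests/engine/test_common_engine.py: both test cases (with/without trace_carrier) add the prompt_token_ids_len and sampling_params mock attributes, but the test bodies only assert trace_set_proc_propagate_context and do not verify that prompt_tokens_total, request_prompt_tokens, and request_params_max_tokens are called correctly. The tests only ensure that nothing crashes and provide no regression protection.

Consider adding assertions after the eng._insert_zmq_task_to_scheduler() call (needed in both cases):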

# Verify the newly added metric reporting
eng.metrics.prompt_tokens_total.inc.assert_called_once_with(2)
eng.metrics.request_prompt_tokens.observe.assert_called_once_with(2)
eng.metrics.request_params_max_tokens.observe.assert_called_once_with(16)

Also consider adding an edge case with sampling_params=None to verify that request_params_max_tokens is not called when there are no sampling_params.
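A minimal sketch of such an edge-case test follows, using `unittest.mock.MagicMock`. The helper `report_request_metrics` is hypothetical and only mirrors the attribute-style calls (`.inc`, `.observe`) used in the reviewer's assertions; the real test would call `eng._insert_zmq_task_to_scheduler()` against the project's own fixtures instead.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def report_request_metrics(metrics, request):
    """Hypothetical helper standing in for the reporting code under test."""
    metrics.prompt_tokens_total.inc(request.prompt_token_ids_len)
    metrics.request_prompt_tokens.observe(request.prompt_token_ids_len)
    if request.sampling_params is not None:
        metrics.request_params_max_tokens.observe(request.sampling_params.max_tokens)


def test_max_tokens_skipped_without_sampling_params():
    metrics = MagicMock()
    request = SimpleNamespace(prompt_token_ids_len=2, sampling_params=None)
    report_request_metrics(metrics, request)
    metrics.prompt_tokens_total.inc.assert_called_once_with(2)
    metrics.request_prompt_tokens.observe.assert_called_once_with(2)
    # The max_tokens metric must not be touched for pooling-style requests.
    metrics.request_params_max_tokens.observe.assert_not_called()


def test_max_tokens_reported_with_sampling_params():
    metrics = MagicMock()
    request = SimpleNamespace(
        prompt_token_ids_len=2, sampling_params=SimpleNamespace(max_tokens=16)
    )
    report_request_metrics(metrics, request)
    metrics.request_params_max_tokens.observe.assert_called_once_with(16)
```

Both functions follow the pytest naming convention, so they would be collected automatically if placed in a test module.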

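Status of previous findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | PR title uses the non-official tag [Metrics] | ⚠️ Still present |
| F2 | All sections of the PR description were empty | ✅ Fixed |

📝 PR convention check

The PR title uses [Metrics], which is not in FastDeploy's official tag list; [Engine] should be used instead.

Suggested title (can be copied directly):

  • [Engine] Move prompt_tokens_total metrics report to main process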
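Suggested PR description (can be copied directly)
## Motivation
Move the reporting point of the three metrics `prompt_tokens_total`, `request_prompt_tokens`, and `request_params_max_tokens` from `engine_client.py` (API process) to `_insert_zmq_task_to_scheduler` in the `common_engine.py` main process, so that the metrics are recorded only when a request actually enters the scheduler. This makes their semantics more accurate and removes the cross-process dependency of writing to the main-process metrics object from the API process.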

## Modifications
- `fastdeploy/engine/common_engine.py`: add reporting of the three metrics `prompt_tokens_total`, `request_prompt_tokens`, and `request_params_max_tokens` where requests are enqueued in `_insert_zmq_task_to_scheduler`
- `fastdeploy/entrypoints/engine_client.py`: remove the reporting of these three metrics and the corresponding `main_process_metrics` import

## Usage or Command
pytest tests/engine/test_common_engine.py tests/pooling/test_Qwen3-Embedding_serving.py tests/pooling/test_Ernie4_5_reward_serving.py

## Accuracy Tests
N/A. This PR only moves the metric reporting location and does not affect model output logic.

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests. (tests updated in test_common_engine.py)
- [x] Provide accuracy results. (N/A, no change to model output)
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

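Overall assessment

The code change is concise, the direction of the metric migration is correct, and the logic is sound. The tests add the necessary mock attributes but lack matching assertions; adding them is recommended to make the tests more effective. The PR title tag should be changed to the official tag [Engine].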
